A Split-Merge MCMC Algorithm for the Hierarchical Dirichlet Process

Authors

  • Chong Wang
  • David M. Blei
Abstract

The hierarchical Dirichlet process (HDP) has become an important Bayesian nonparametric model for grouped data, such as document collections. The HDP is used to construct a flexible mixed-membership model where the number of components is determined by the data. As for most Bayesian nonparametric models, exact posterior inference is intractable, so practitioners use Markov chain Monte Carlo (MCMC) or variational inference. Inspired by the split-merge MCMC algorithm for the Dirichlet process (DP) mixture model, we describe a novel split-merge MCMC sampling algorithm for posterior inference in the HDP. We study its properties on both synthetic data and text corpora. We find that split-merge MCMC for the HDP can provide significant improvements over traditional Gibbs sampling, and we give some understanding of the data properties that give rise to larger improvements.
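To make the phrase "the number of components is determined by the data" concrete, the following minimal sketch draws assignments from the HDP prior using its standard Chinese restaurant franchise representation. It is not the split-merge sampler described in the abstract and performs no posterior inference. The function name `sample_crf`, the parameter names `alpha` (group-level concentration) and `gamma` (top-level concentration), and the returned dictionary layout are assumptions made for illustration.

```python
import random


def sample_crf(group_sizes, alpha=1.0, gamma=1.0, seed=0):
    """Draw table and dish assignments from the Chinese restaurant franchise prior.

    Illustrative sketch only: alpha and gamma are assumed names for the
    group-level and top-level concentration parameters.
    """
    rng = random.Random(seed)
    dish_table_counts = []  # m_k: number of tables (over all groups) serving dish k
    groups = []
    for n_j in group_sizes:
        table_sizes = []    # n_jt: number of customers at each table of this group
        table_dish = []     # dish (shared component) served at each table
        customer_dish = []  # component assigned to each customer (data point)
        for _ in range(n_j):
            # Existing table t with weight n_jt, a new table with weight alpha.
            t = rng.choices(range(len(table_sizes) + 1),
                            weights=table_sizes + [alpha])[0]
            if t == len(table_sizes):
                # New table: existing dish k with weight m_k, new dish with weight gamma.
                k = rng.choices(range(len(dish_table_counts) + 1),
                                weights=dish_table_counts + [gamma])[0]
                if k == len(dish_table_counts):
                    dish_table_counts.append(0)
                dish_table_counts[k] += 1
                table_sizes.append(0)
                table_dish.append(k)
            table_sizes[t] += 1
            customer_dish.append(table_dish[t])
        groups.append({"table_sizes": table_sizes,
                       "table_dish": table_dish,
                       "customer_dish": customer_dish})
    return groups, dish_table_counts


if __name__ == "__main__":
    groups, m = sample_crf([50, 30, 20], alpha=1.0, gamma=1.0, seed=1)
    print("components used across all groups:", len(m))
    print("tables per group:", [len(g["table_sizes"]) for g in groups])
```

Components (dishes) are shared across groups while each group keeps its own tables, which is the mixed-membership structure the abstract refers to. Neither the number of tables nor the number of components is fixed in advance; both grow with the amount of data and the concentration parameters.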

Similar resources

A Smart-Dumb/Dumb-Smart Algorithm for Efficient Split-Merge MCMC

Split-merge moves are a standard component of MCMC algorithms for tasks such as multitarget tracking and fitting mixture models with unknown numbers of components. Achieving rapid mixing for split-merge MCMC has been notoriously difficult, and state-of-the-art methods do not scale well. We explore the reasons for this and propose a new split-merge kernel consisting of two sub-kernels: one combi...

Sequentially-Allocated Merge-Split Sampler for Conjugate and Nonconjugate Dirichlet Process Mixture Models

This paper proposes a new efficient merge-split sampler for both conjugate and nonconjugate Dirichlet process mixture (DPM) models. These Bayesian nonparametric models are usually fit using Markov chain Monte Carlo (MCMC) or sequential importance sampling (SIS). The latest generation of Gibbs and Gibbs-like samplers for both conjugate and nonconjugate DPM models effectively update the model para...

Split-Merge Augmented Gibbs Sampling for Hierarchical Dirichlet Processes

The Hierarchical Dirichlet Process (HDP) model is an important tool for topic analysis. Inference can be performed through a Gibbs sampler using the auxiliary variable method. We propose a split-merge procedure to augment this method of inference, facilitating faster convergence. Whilst the incremental Gibbs sampler changes topic assignments of each word conditioned on the previous observations ...

Parallel Sampling of HDPs using Sub-Cluster Splits

We develop a sampling technique for Hierarchical Dirichlet process models. The parallel algorithm builds upon [1] by proposing large split and merge moves based on learned sub-clusters. The additional global split and merge moves drastically improve convergence in the experimental results. Furthermore, we discover that cross-validation techniques do not adequately determine convergence, and tha...

Parallel Sampling of DP Mixture Models using Sub-Clusters Splits

We present an MCMC sampler for Dirichlet process mixture models that can be parallelized to achieve significant computational gains. We combine a nonergodic, restricted Gibbs iteration with split/merge proposals in a manner that produces an ergodic Markov chain. Each cluster is augmented with two subclusters to construct likely split moves. Unlike some previous parallel samplers, the proposed s...

Journal:
  • CoRR

Volume abs/1201.1657

Publication date 2012